Fix a failing Pipeline Flow - Instrumentation & Observability
## **Problem**

We lack the instrumentation to answer two fundamental questions about the Fix Pipeline flow:

1. Is it delivering value to developers?
2. Is it behaving as designed?

Without these metrics, we cannot measure whether the flow is saving developer time or resolving failures effectively.

## **Proposal**

Build a clear, reusable instrumentation layer for the Fix CI/CD Pipeline with Duo flow that enables product, engineering, and data teams to measure adoption, performance, failure patterns, and step-level behavior. The layer should be designed for reuse across all Duo Workflow flows.

### Flow Impact Metrics

Why: Measure the tangible time savings the feature delivers to developers, providing a direct indicator of business value and ROI to justify continued investment.

| Priority | Metric | Target Visualization | Notes |
|----------|--------|----------------------|-------|
| High | Average time from pipeline failure to green pipeline | Tableau | https://gitlab.com/gitlab-data/product-analytics/-/work_items/3211+ |
| High | Fix Code Suggestion Acceptance Rate | Tableau | New internal event needed - gitlab#598452 |
| High | GitLab Credits Used | Monetization Dashboard - [Tableau](https://10az.online.tableau.com/#/site/gitlab/views/DuoAgentPlatformMonetizationMetrics/BillableEvents?:iid=1) (SAFE) | |

### Standard Flow Metrics

Why: Understand overall feature adoption and where users drop off to prioritize reliability and UX improvements.

| Priority | Metric | Target Visualization | Notes |
|----------|--------|----------------------|-------|
| High | Number of times Fix Pipeline Flow was triggered | Fix Failing Pipeline Flow Dashboard - [Tableau](https://10az.online.tableau.com/#/site/gitlab/views/FixPipelineFlowDashboard/FixPipelineFlowDashboard?:iid=1) | Available in Kibana |
| High | Number of times Fix Pipeline Flow completed successfully | Fix Failing Pipeline Flow Dashboard - [Tableau](https://10az.online.tableau.com/#/site/gitlab/views/FixPipelineFlowDashboard/FixPipelineFlowDashboard?:iid=1) | Available in Kibana |
| High | Number of times Fix Pipeline Flow failed to complete | Fix Failing Pipeline Flow Dashboard - [Tableau](https://10az.online.tableau.com/#/site/gitlab/views/FixPipelineFlowDashboard/FixPipelineFlowDashboard?:iid=1) | Available in Kibana |
| Medium | Number of flows aborted by user | Fix Failing Pipeline Flow Dashboard - [Tableau](https://10az.online.tableau.com/#/site/gitlab/views/FixPipelineFlowDashboard/FixPipelineFlowDashboard?:iid=1) | Available in Kibana |
| High | Conversion rate: Number of flows that resulted in a fix / Number of flows triggered | Fix Failing Pipeline Flow Dashboard - [Tableau](https://10az.online.tableau.com/#/site/gitlab/views/FixPipelineFlowDashboard/FixPipelineFlowDashboard?:iid=1) | |
| High | Conversion rate: Number of flows that resulted in a comment / Number of flows triggered | Fix Failing Pipeline Flow Dashboard - [Tableau](https://10az.online.tableau.com/#/site/gitlab/views/FixPipelineFlowDashboard/FixPipelineFlowDashboard?:iid=1) | |
| High | Conversion rate: Number of flows that resulted in an auto-retry / Number of flows triggered | Tableau | |
| High | LLM calls per flow | Monetization Dashboard - [Tableau](https://10az.online.tableau.com/#/site/gitlab/views/DuoAgentPlatformMonetizationMetrics/DuoAgentPlatformMonetizationInsights?:iid=1) (SAFE) | |
| High | Token consumption per flow | Monetization Dashboard - [Tableau](https://10az.online.tableau.com/#/site/gitlab/views/DuoAgentPlatformMonetizationMetrics/TokenConsumptionMetrics?:iid=1) (SAFE) | |
| High | Duration of the flow | Product Adoption Dashboard - [Tableau](https://10az.online.tableau.com/#/site/gitlab/views/AgenticAIProductAdoption/Overview?:iid=1) | |

### Failure Classification

Why: Categorize why flows fail to prioritize the highest-impact fixes and track whether the LLM is correctly scoping problems it can solve.

<table>
<tr>
<th>Priority</th>
<th>Metric</th>
<th>Visualization</th>
<th>Notes</th>
</tr>
<tr>
<td>Medium</td>
<td>Failure Reason/Category - Can we log this information based on the LLM reasoning?</td>
<td>Tableau</td>
<td></td>
</tr>
<tr>
<td>Medium</td>
<td>

Commonly suggested fix
* Is it to retry the job, push an MR, or change the CI config?

</td>
<td>Tableau</td>
<td></td>
</tr>
</table>

### Flow Step Level Metrics (to implement the above)

Why: Understand how the flow executes internally to identify bottlenecks, unexpected paths, and opportunities to improve the flow's decision-making.

| Priority | Metric | Visualization | Notes |
|----------|--------|---------------|-------|
| Low | Step Duration | Kibana | |
| Low | Step Status | Kibana | |
| Medium | Step Failure Reason | Kibana | |

### **Filtering / Segmentation Dimensions**

* GitLab project
* Trigger type (DAP automation vs. manual)
* Pipeline Source Type (Merge Request, Scheduled, Push, etc.)
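
The three conversion rates listed under Standard Flow Metrics share one definition: the number of flows with a given outcome divided by the number of flows triggered. The following is a minimal Python sketch of that calculation, not the dashboard's actual query. The `FlowRun` record, its field names, and the outcome labels (`fix`, `comment`, `auto_retry`) are assumptions made for illustration; the real event schema may differ.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

# Hypothetical outcome labels; the real internal events may use other names.
OUTCOMES = ("fix", "comment", "auto_retry")


@dataclass(frozen=True)
class FlowRun:
    """One triggered flow run (hypothetical record shape)."""
    flow_id: str
    outcome: Optional[str]  # one of OUTCOMES, or None if the run produced no result


def conversion_rates(runs: Iterable[FlowRun]) -> Dict[str, float]:
    """Return, per outcome, flows with that outcome / flows triggered."""
    runs = list(runs)
    triggered = len(runs)
    rates = {}
    for outcome in OUTCOMES:
        count = sum(1 for run in runs if run.outcome == outcome)
        # Report 0.0 instead of dividing by zero when nothing was triggered.
        rates[outcome] = count / triggered if triggered else 0.0
    return rates
```

Every triggered run stays in the denominator, including failed and user-aborted runs, so the three rates remain comparable with the trigger count in the same table. Applying one of the segmentation dimensions amounts to filtering the runs before calling `conversion_rates`.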